The 3W Model and Algebra for Unified Data Mining
Abstract
Real data mining/analysis applications call for a framework which adequately supports knowledge discovery as a multi-step process, where the input of one mining operation can be the output of another. Previous studies, primarily focusing on fast computation of one specific mining task at a time, ignore this vital issue. Motivated by this observation, we develop a unified model supporting all major mining and analysis tasks. Our model consists of three distinct worlds, corresponding to intensional and extensional dimensions, and to data sets. The notion of dimension is a centerpiece of the model. Equipped with hierarchies, dimensions integrate the output of seemingly dissimilar mining and analysis operations in a clean manner. We propose an algebra, called the dimension algebra, for manipulating (intensional) dimensions, as well as operators that serve as "bridges" between the worlds. We demonstrate by examples that several real data mining processes can be captured using our model and algebra. We demonstrate the naturality of the algebra by establishing several identities. Finally, we discuss efficient implementation of the proposed framework.
Similar resources
Model Selection Based on Tracking Interval Under Unified Hybrid Censored Samples
The aim of statistical modeling is to identify the model that most closely approximates the underlying process. Akaike information criterion (AIC) is commonly used for model selection but the precise value of AIC has no direct interpretation. In this paper we use a normalization of a difference of Akaike criteria in comparing between the two rival models under unified hybrid cens...
Presenting a Model for Predicting Tax Evasion of Guilds Based on Data Mining Technique
In this research, considering the importance of the topic and the gap in previous research, a model for predicting tax evasion of guilds based on a data mining technique is presented. The analyzed data covers 5600 tax files of all trades with tax codes in Qazvin province during the years 2013-2018. The tax files related to guilds fall into five tax groups, including the guild group o...
Context-aware Modeling for Spatio-temporal Data Transmitted from a Wireless Body Sensor Network
Context-aware systems must be interoperable and work across different platforms at any time and in any place. Context data collected from wireless body area networks (WBAN) may be heterogeneous and imperfect, which makes their design and implementation difficult. In this research, we introduce a model which takes the dynamic nature of a context-aware system into consideration. This model is con...
A UNIFIED MODEL FOR RESOURCE-CONSTRAINED PROJECT SCHEDULING PROBLEM WITH UNCERTAIN ACTIVITY DURATIONS
In this paper we present a unified (probabilistic/possibilistic) model for resource-constrained project scheduling problem (RCPSP) with uncertain activity durations and a concept of a heuristic approach connected to the theoretical model. It is shown that the uncertainty management can be built into any heuristic algorithm developed to solve RCPSP with deterministic activity durations. The esse...
A Unified Approach for Quality Control of Drilled Stem Test (DST) and PVT Data
Finding a representative fluid in a hydrocarbon reservoir is crucial for integrated reservoir management. In this study, a systematic approach for screening and selecting consistent fluid samples in a reservoir was developed. The model integrated quality control (QC) of well conditioning before sampling, QC of PVT data, thermodynamic modeling and compositional gradient within a reservoir. Well ...